AITopics | proximal-gradient method

Collaborating Authors

proximal-gradient method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

Neural Information Processing SystemsMar-15-2024, 06:54:59 GMT

We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.

convergence rate, proximal-gradient method, proximity operator, (12 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-\L{}ojasiewicz Condition

Karimi, Hamed, Nutini, Julie, Schmidt, Mark

arXiv.org Machine LearningJun-11-2018

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the \L{}ojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-\L{}ojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence of these methods. Along the way, we give simple convergence results for a wide variety of problems in machine learning: least squares, logistic regression, boosting, resilient backpropagation, L1-regularization, support vector machines, stochastic dual coordinate ascent, and stochastic variance-reduced gradient methods.

artificial intelligence, inequality, machine learning, (16 more...)

arXiv.org Machine Learning

1608.04636

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Add feedback

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems

Wang, Zhaoran, Liu, Han, Zhang, Tong

arXiv.org Machine LearningJan-26-2015

We provide theoretical analysis of the statistical and computational properties of penalized $M$-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path-following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution attained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for one single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of nonconvex penalty.

assumption 4, convergence, probability, (16 more...)

arXiv.org Machine Learning

doi: 10.1214/14-AOS1238

1306.496

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.65)

Industry: Education > Focused Education > Special Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Fast Distributed Proximal-Gradient Method

Chen, Annie I., Ozdaglar, Asuman

arXiv.org Machine LearningOct-8-2012

We present a distributed proximal-gradient method for optimizing the average of convex functions, each of which is the private local objective of an agent in a network with time-varying topology. The local objectives have distinct differentiable components, but they share a common nondifferentiable component, which has a favorable structure suitable for effective computation of the proximal operator. In our method, each agent iteratively updates its estimate of the global minimum by optimizing its local objective function, and exchanging estimates with others via communication in the network. Using Nesterov-type acceleration techniques and multiple communication steps per iteration, we show that this method converges at the rate 1/k (where k is the number of communication rounds between the agents), which is faster than the convergence rate of the existing distributed methods for solving this problem. The superior convergence rate of our method is also verified by numerical experiments.

artificial intelligence, machine learning, proximal-gradient method, (15 more...)

arXiv.org Machine Learning

1210.2289

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

Schmidt, Mark, Roux, Nicolas L., Bach, Francis R.

Neural Information Processing SystemsDec-31-2011

We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the second term. We show that the basic proximal-gradient method, the basic proximal-gradient method with a strong convexity assumption, and the accelerated proximal-gradient method achieve the same convergence rates as in the error-free case, provided the errors decrease at an appropriate rate. Our experimental results on a structured sparsity problem indicate that sequences of errors with these appealing theoretical properties can lead to practical performance improvements.

artificial intelligence, machine learning, proximal-gradient method, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso

Chen, Xi, Kim, Seyoung, Lin, Qihang, Carbonell, Jaime G., Xing, Eric P.

arXiv.org Machine LearningMay-19-2010

We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as l1/l2-regularized multi-task regression assume that all of the output variables are equally related to the inputs, although in many real-world problems, outputs are related in a complex manner. In this paper, we propose graph-guided fused lasso (GFlasso) for structured multi-task regression that exploits the graph structure over the output variables. We introduce a novel penalty function based on fusion penalty to encourage highly correlated outputs to share a common set of relevant inputs. In addition, we propose a simple yet efficient proximal-gradient method for optimizing GFlasso that can also be applied to any optimization problems with a convex smooth loss and the general class of fusion penalty defined on arbitrary graph structures. By exploiting the structure of the non-smooth ''fusion penalty'', our method achieves a faster convergence rate than the standard first-order method, sub-gradient method, and is significantly more scalable than the widely adopted second-order cone-programming and quadratic-programming formulations. In addition, we provide an analysis of the consistency property of the GFlasso model. Experimental results not only demonstrate the superiority of GFlasso over the standard lasso but also show the efficiency and scalability of our proximal-gradient method.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Machine Learning

1005.3579

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)

Add feedback